Nwm client IndexError: invalid index to scalar variable. #180

Closed
aaraney opened this issue Feb 25, 2022 · 6 comments · Fixed by #182

@aaraney
Member

aaraney commented Feb 25, 2022

Justin Hunter reported an issue when trying to retrieve a short range forecast using the nwm_client's gcp.NWMDataService. I verified that I can reproduce the issue locally.

Reproduce

pip install "hydrotools.nwm_client[gcp]"
pip list | grep nwm

hydrotools.nwm-client    5.0.1

from hydrotools.nwm_client import gcp
service = gcp.NWMDataService()
df = service.get(configuration="short_range", reference_time="20210101T01Z")

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/hydrotools/nwm_client/gcp.py", line 274, in get_DataFrame
    scale_factor = ds['streamflow'].scale_factor[0]
IndexError: invalid index to scalar variable.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/hydrotools/nwm_client/gcp.py", line 429, in get
    return cache.get(
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/hydrotools/caches/hdf.py", line 93, in get
    df = function(*args, **kwargs)
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/hydrotools/nwm_client/gcp.py", line 353, in get_cycle
    df = pd.concat(dataframes)
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 346, in concat
    op = _Concatenator(
  File "~/github/sandbox/test/venv/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 400, in __init__
    objs = list(objs)
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/Caskroom/miniconda/base/envs/venv/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
IndexError: invalid index to scalar variable.
@aaraney
Member Author

aaraney commented Feb 25, 2022

From the stack trace, it appears that the metadata field scale_factor for the streamflow variable in one of the NWM's channel_rt output files is not being deserialized as a collection (list, etc.) and is instead a scalar (int, float, etc.). This may have been caused by an upstream change in a dependency (xarray, h5netcdf).

@aaraney
Member Author

aaraney commented Feb 25, 2022

I was able to resolve this issue by removing the index in the scale_factor object.

line 274 python/nwm_client/src/hydrotools/nwm_client/gcp.py

            # Extract scale factor
            scale_factor = ds['streamflow'].scale_factor[0]

            # fixed with
            scale_factor = ds['streamflow'].scale_factor

I am assuming that the metadata layout of NWM channel route link files is pretty static over time as we've not seen this issue before. I assume this is a deserialization issue propagating from, if I had to guess, xarray.

It might be best if we push a hot fix that guards and type checks the scale_factor field while we track down and figure out what is causing this and determine a long term solution.

@aaraney
Member Author

aaraney commented Feb 25, 2022

Found the issue. It is propagating from h5netcdf. Today they released 0.14.0, which introduced the following change, per their changelog:

Return items from 0-dim and one-element 1-dim array attributes. Return multi-element attributes as lists. Return string attributes as Python strings decoded from their respective encoding (utf-8, ascii). By Kai Mühlbauer.

I verified that rolling the version back to 0.13.0 resolved this issue.
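As a minimal illustration of the failure mode, assuming the attribute now arrives as a NumPy scalar (which the error message in the traceback suggests), the sketch below indexes both shapes. The value 0.01 and the names old_style and new_style are placeholders, not values read from an NWM file.

```python
import numpy as np

# Stand-ins for the attribute before and after the h5netcdf change.
# The value 0.01 is illustrative only, not taken from a real NWM file.
old_style = np.array([0.01], dtype=np.float32)  # one-element array
new_style = np.float32(0.01)                    # NumPy scalar

# Indexing a one-element array works.
print(old_style[0])

# Indexing a NumPy scalar raises IndexError.
try:
    new_style[0]
except IndexError as error:
    message = str(error)
    print(message)
```

This only mimics the two shapes with plain NumPy objects; it does not call h5netcdf or xarray.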

@aaraney
Member Author

aaraney commented Feb 25, 2022

Now as to how we should proceed. I know previously I said:

It might be best if we push a hot fix that guards and type checks the scale_factor field while we track down and figure out what is causing this and determine a long term solution.

In this case, I think it makes sense to just type check ds.streamflow.scale_factor and handle the case where a scalar is returned. I don't want to force others to comply with a version pin of h5netcdf. Thoughts, @jarq6c?

proposed solution

import numpy as np  # needed for the isinstance check below

streamflow = ds['streamflow']

# h5netcdf <= 0.13.0 always deserializes numeric attributes to numpy arrays,
# even if there is only one item in the array.
if isinstance(streamflow.scale_factor, np.ndarray):
    scale_factor = streamflow.scale_factor[0]

# h5netcdf > 0.13.0 returns a collection only if the attribute holds more than one element;
# otherwise, a scalar numpy value is returned.
else:
    scale_factor = streamflow.scale_factor
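An alternative to the explicit type check, offered only as a sketch: flatten whatever comes back and take the first element, which handles a scalar, a list, or an array the same way. The helper name first_element is hypothetical and is not part of hydrotools.

```python
import numpy as np


def first_element(attribute):
    """Return the first element of a scalar, list, or array attribute.

    Hypothetical helper for illustration; not part of hydrotools.
    """
    # np.ravel converts scalars, lists, and arrays into a 1-D array,
    # so indexing with [0] is safe for every non-empty input.
    return np.ravel(attribute)[0]


# Illustrative inputs covering the shapes discussed in this thread.
for raw in (np.array([0.01]), np.float64(0.01), 0.01, [0.01, 0.02]):
    print(type(raw).__name__, first_element(raw))
```

The trade-off is that this silently accepts multi-element attributes and returns only the first value, whereas an explicit check makes that case visible.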

@jarq6c
Collaborator

jarq6c commented Feb 25, 2022

If the source attribute was a single scalar all along and was only returned in a list because of some conceit of h5netcdf, I'm inclined to just drop the index and leave it at that. Is there a good reason to continue supporting h5netcdf <= 0.13.0?

@aaraney
Member Author

aaraney commented Mar 1, 2022

After talking with @jarq6c offline, we came to a solution (please correct me where necessary, @jarq6c). Given that h5netcdf==0.14.0 was released on 2022-02-25, we will pin the current version of nwm_client (5.0.1) to h5netcdf <= 0.13.0 and publish it as a post-release of 5.0.1. Subsequently, nwm_client==5.0.2 will be released with a pin of h5netcdf >= 0.14.0 and will include a patch that complies with h5netcdf >= 0.14.0.

aaraney added a commit to aaraney/hydrotools that referenced this issue Mar 1, 2022
jarq6c added a commit that referenced this issue Mar 16, 2022
Resolve Nwm client IndexError: invalid index to scalar variable. (#180)